
[Clang] Mark this pointer in destructors dead_on_return#166276

Merged
boomanaiden154 merged 13 commits intollvm:mainfrom
boomanaiden154:clang-destructor-dead-on-return
Feb 6, 2026

Conversation

@boomanaiden154
Contributor

@boomanaiden154 boomanaiden154 commented Nov 4, 2025

This helps clean up dead stores that occur at the end of a destructor. The motivating example was a refactoring of libc++'s basic_string implementation in 8dae17b that added a zeroing store to the destructor, causing a large performance regression on an internal workload. We also saw a ~0.2% performance improvement on an internal server workload when enabling this.

I also tested this against all of the non-flaky tests in our large C++ codebase and found a small number of issues, all of which were in user code.
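To make the motivating pattern concrete, here is a minimal sketch of a destructor that releases its buffer and then zeroes its own fields. `SmallBuf` and `g_frees` are hypothetical names, and this is not the actual libc++ basic_string code from 8dae17b. The idea is that once `this` carries dead_on_return, nothing may legally read `*this` after the destructor returns, so DSE may treat the two trailing stores as dead; without the attribute, the callee has to assume the caller might still inspect that memory.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical counter so the well-defined effect of the destructor is observable.
static int g_frees = 0;

// Hypothetical type; not libc++'s basic_string.
class SmallBuf {
 public:
  explicit SmallBuf(std::size_t n) : data_(new char[n]), size_(n) {}
  SmallBuf(const SmallBuf&) = delete;
  SmallBuf& operator=(const SmallBuf&) = delete;

  ~SmallBuf() {
    delete[] data_;
    ++g_frees;
    // "Defensive" zeroing of an object whose lifetime ends when this
    // destructor returns. With dead_on_return on `this`, DSE may delete
    // these stores because no valid code can observe them afterwards.
    data_ = nullptr;
    size_ = 0;
  }

  std::size_t size() const { return size_; }

 private:
  char* data_;
  std::size_t size_;
};
```

Only the release of the buffer (counted by `g_frees`) is observable in a well-defined program; the zeroed fields are not.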

@boomanaiden154 boomanaiden154 force-pushed the clang-destructor-dead-on-return branch from 7fb6339 to 7a3dec4 December 1, 2025 16:11
@boomanaiden154 boomanaiden154 marked this pull request as ready for review December 1, 2025 17:57
@llvmbot llvmbot added the clang, clang:codegen, and clang:openmp labels Dec 1, 2025
@llvmbot
Member

llvmbot commented Dec 1, 2025

@llvm/pr-subscribers-clang-driver
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Aiden Grossman (boomanaiden154)

Changes



Patch is 5.29 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/166276.diff

109 Files Affected:

  • (modified) clang/lib/CodeGen/CGCall.cpp (+11-1)
  • (modified) clang/test/CodeGen/paren-list-agg-init.cpp (+4-4)
  • (modified) clang/test/CodeGen/temporary-lifetime.cpp (+6-6)
  • (modified) clang/test/CodeGenCXX/amdgcn-automatic-variable.cpp (+1-1)
  • (modified) clang/test/CodeGenCXX/amdgcn-func-arg.cpp (+3-3)
  • (modified) clang/test/CodeGenCXX/control-flow-in-stmt-expr.cpp (+4-4)
  • (modified) clang/test/CodeGenCXX/cxx2a-destroying-delete.cpp (+3-3)
  • (modified) clang/test/CodeGenCXX/for-range.cpp (+12-12)
  • (modified) clang/test/CodeGenCXX/gh62818.cpp (+1-1)
  • (modified) clang/test/CodeGenCXX/nrvo.cpp (+116-116)
  • (modified) clang/test/CodeGenCXX/pr13396.cpp (+2-2)
  • (modified) clang/test/CodeGenCXX/ptrauth-apple-kext-indirect-virtual-dtor-call.cpp (+3-3)
  • (modified) clang/test/CodeGenObjCXX/objc-struct-cxx-abi.mm (+3-3)
  • (modified) clang/test/CodeGenObjCXX/ptrauth-struct-cxx-abi.mm (+1-1)
  • (modified) clang/test/DebugInfo/CXX/bpf-structors.cpp (+1-1)
  • (modified) clang/test/DebugInfo/CXX/trivial_abi.cpp (+1-1)
  • (modified) clang/test/OpenMP/amdgcn_target_global_constructor.cpp (+3-3)
  • (modified) clang/test/OpenMP/distribute_firstprivate_codegen.cpp (+141-141)
  • (modified) clang/test/OpenMP/distribute_lastprivate_codegen.cpp (+145-145)
  • (modified) clang/test/OpenMP/distribute_parallel_for_firstprivate_codegen.cpp (+188-188)
  • (modified) clang/test/OpenMP/distribute_parallel_for_lastprivate_codegen.cpp (+211-211)
  • (modified) clang/test/OpenMP/distribute_parallel_for_num_threads_codegen.cpp (+24-24)
  • (modified) clang/test/OpenMP/distribute_parallel_for_private_codegen.cpp (+88-88)
  • (modified) clang/test/OpenMP/distribute_parallel_for_simd_firstprivate_codegen.cpp (+112-112)
  • (modified) clang/test/OpenMP/distribute_parallel_for_simd_lastprivate_codegen.cpp (+144-144)
  • (modified) clang/test/OpenMP/distribute_parallel_for_simd_num_threads_codegen.cpp (+510-510)
  • (modified) clang/test/OpenMP/distribute_parallel_for_simd_private_codegen.cpp (+132-132)
  • (modified) clang/test/OpenMP/distribute_private_codegen.cpp (+64-64)
  • (modified) clang/test/OpenMP/distribute_simd_firstprivate_codegen.cpp (+88-88)
  • (modified) clang/test/OpenMP/distribute_simd_lastprivate_codegen.cpp (+116-116)
  • (modified) clang/test/OpenMP/distribute_simd_private_codegen.cpp (+108-108)
  • (modified) clang/test/OpenMP/for_firstprivate_codegen.cpp (+40-40)
  • (modified) clang/test/OpenMP/for_lastprivate_codegen.cpp (+202-202)
  • (modified) clang/test/OpenMP/for_linear_codegen.cpp (+102-102)
  • (modified) clang/test/OpenMP/for_private_codegen.cpp (+33-33)
  • (modified) clang/test/OpenMP/for_reduction_codegen.cpp (+647-311)
  • (modified) clang/test/OpenMP/master_taskloop_firstprivate_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/master_taskloop_in_reduction_codegen.cpp (+58-58)
  • (modified) clang/test/OpenMP/master_taskloop_lastprivate_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/master_taskloop_private_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/master_taskloop_simd_firstprivate_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/master_taskloop_simd_in_reduction_codegen.cpp (+71-71)
  • (modified) clang/test/OpenMP/master_taskloop_simd_lastprivate_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/master_taskloop_simd_private_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/parallel_copyin_codegen.cpp (+70-70)
  • (modified) clang/test/OpenMP/parallel_firstprivate_codegen.cpp (+74-74)
  • (modified) clang/test/OpenMP/parallel_for_linear_codegen.cpp (+13-13)
  • (modified) clang/test/OpenMP/parallel_master_codegen.cpp (+8-8)
  • (modified) clang/test/OpenMP/parallel_master_taskloop_firstprivate_codegen.cpp (+200-200)
  • (modified) clang/test/OpenMP/parallel_master_taskloop_lastprivate_codegen.cpp (+231-231)
  • (modified) clang/test/OpenMP/parallel_master_taskloop_simd_firstprivate_codegen.cpp (+213-213)
  • (modified) clang/test/OpenMP/parallel_master_taskloop_simd_lastprivate_codegen.cpp (+306-306)
  • (modified) clang/test/OpenMP/parallel_private_codegen.cpp (+49-49)
  • (modified) clang/test/OpenMP/parallel_reduction_codegen.cpp (+113-113)
  • (modified) clang/test/OpenMP/scope_codegen.cpp (+358-358)
  • (modified) clang/test/OpenMP/sections_firstprivate_codegen.cpp (+41-41)
  • (modified) clang/test/OpenMP/sections_lastprivate_codegen.cpp (+83-83)
  • (modified) clang/test/OpenMP/sections_private_codegen.cpp (+30-30)
  • (modified) clang/test/OpenMP/sections_reduction_codegen.cpp (+43-43)
  • (modified) clang/test/OpenMP/simd_private_taskloop_codegen.cpp (+112-112)
  • (modified) clang/test/OpenMP/single_codegen.cpp (+455-455)
  • (modified) clang/test/OpenMP/single_firstprivate_codegen.cpp (+41-41)
  • (modified) clang/test/OpenMP/single_private_codegen.cpp (+30-30)
  • (modified) clang/test/OpenMP/target_has_device_addr_codegen.cpp (+65-65)
  • (modified) clang/test/OpenMP/target_in_reduction_codegen.cpp (+6-6)
  • (modified) clang/test/OpenMP/target_parallel_generic_loop_codegen-1.cpp (+84-84)
  • (modified) clang/test/OpenMP/target_teams_distribute_firstprivate_codegen.cpp (+68-68)
  • (modified) clang/test/OpenMP/target_teams_distribute_lastprivate_codegen.cpp (+118-118)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_firstprivate_codegen.cpp (+186-186)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_lastprivate_codegen.cpp (+172-172)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_private_codegen.cpp (+150-150)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_simd_firstprivate_codegen.cpp (+216-216)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_simd_lastprivate_codegen.cpp (+144-144)
  • (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_simd_private_codegen.cpp (+194-194)
  • (modified) clang/test/OpenMP/target_teams_distribute_private_codegen.cpp (+58-58)
  • (modified) clang/test/OpenMP/target_teams_distribute_simd_firstprivate_codegen.cpp (+98-98)
  • (modified) clang/test/OpenMP/target_teams_distribute_simd_lastprivate_codegen.cpp (+116-116)
  • (modified) clang/test/OpenMP/target_teams_distribute_simd_private_codegen.cpp (+106-106)
  • (modified) clang/test/OpenMP/target_teams_generic_loop_private_codegen.cpp (+102-102)
  • (modified) clang/test/OpenMP/task_codegen.cpp (+3057-1972)
  • (modified) clang/test/OpenMP/task_in_reduction_codegen.cpp (+57-57)
  • (modified) clang/test/OpenMP/taskloop_firstprivate_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/taskloop_in_reduction_codegen.cpp (+58-58)
  • (modified) clang/test/OpenMP/taskloop_lastprivate_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/taskloop_private_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/taskloop_simd_firstprivate_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/taskloop_simd_in_reduction_codegen.cpp (+71-71)
  • (modified) clang/test/OpenMP/taskloop_simd_lastprivate_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/taskloop_simd_private_codegen.cpp (+1-1)
  • (modified) clang/test/OpenMP/teams_distribute_firstprivate_codegen.cpp (+68-68)
  • (modified) clang/test/OpenMP/teams_distribute_lastprivate_codegen.cpp (+139-139)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_firstprivate_codegen.cpp (+100-100)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_lastprivate_codegen.cpp (+205-205)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_num_threads_codegen.cpp (+14-14)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_private_codegen.cpp (+82-82)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_firstprivate_codegen.cpp (+130-130)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_lastprivate_codegen.cpp (+144-144)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_num_threads_codegen.cpp (+252-252)
  • (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_private_codegen.cpp (+130-130)
  • (modified) clang/test/OpenMP/teams_distribute_private_codegen.cpp (+58-58)
  • (modified) clang/test/OpenMP/teams_distribute_simd_firstprivate_codegen.cpp (+98-98)
  • (modified) clang/test/OpenMP/teams_distribute_simd_lastprivate_codegen.cpp (+116-116)
  • (modified) clang/test/OpenMP/teams_distribute_simd_private_codegen.cpp (+106-106)
  • (modified) clang/test/OpenMP/teams_firstprivate_codegen.cpp (+74-74)
  • (modified) clang/test/OpenMP/teams_generic_loop_private_codegen.cpp (+58-58)
  • (modified) clang/test/OpenMP/teams_private_codegen.cpp (+80-80)
  • (modified) clang/test/OpenMP/threadprivate_codegen.cpp (+392-392)
  • (modified) clang/test/utils/update_cc_test_checks/Inputs/basic-cplusplus.cpp.expected (+13-13)
  • (modified) clang/test/utils/update_cc_test_checks/Inputs/explicit-template-instantiation.cpp.expected (+3-3)
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index efacb3cc04c01..ee6e13fd1c1a5 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -2767,7 +2767,8 @@ void CodeGenModule::ConstructAttributeList(StringRef Name,
   }
 
   // Apply `nonnull`, `dereferenceable(N)` and `align N` to the `this` argument,
-  // unless this is a thunk function.
+  // unless this is a thunk function. Add dead_on_return to the `this` argument
+  // in base class destructors to aid in DSE.
   // FIXME: fix this properly, https://reviews.llvm.org/D100388
   if (FI.isInstanceMethod() && !IRFunctionArgs.hasInallocaArg() &&
       !FI.arg_begin()->type->isVoidPointerType() && !IsThunk) {
@@ -2800,6 +2801,15 @@ void CodeGenModule::ConstructAttributeList(StringRef Name,
             .getAsAlign();
     Attrs.addAlignmentAttr(Alignment);
 
+    if (isa_and_nonnull<CXXDestructorDecl>(
+            CalleeInfo.getCalleeDecl().getDecl())) {
+      auto *ClassDecl = dyn_cast<CXXRecordDecl>(
+          CalleeInfo.getCalleeDecl().getDecl()->getDeclContext());
+      if (ClassDecl->getNumBases() == 0 && ClassDecl->getNumVBases() == 0) {
+        Attrs.addAttribute(llvm::Attribute::DeadOnReturn);
+      }
+    }
+
     ArgAttrs[IRArgs.first] = llvm::AttributeSet::get(getLLVMContext(), Attrs);
   }
 
diff --git a/clang/test/CodeGen/paren-list-agg-init.cpp b/clang/test/CodeGen/paren-list-agg-init.cpp
index e30777ecc07d6..561bf2b5eb9c4 100644
--- a/clang/test/CodeGen/paren-list-agg-init.cpp
+++ b/clang/test/CodeGen/paren-list-agg-init.cpp
@@ -394,9 +394,9 @@ namespace gh61145 {
   // a.k.a. Vec::Vec(Vec&&)
   // CHECK-NEXT: call void @_ZN7gh611453VecC1EOS0_(ptr noundef nonnull align 1 dereferenceable(1) [[AGG_TMP_ENSURED]], ptr noundef nonnull align 1 dereferenceable(1) [[V]])
   // a.k.a. S1::~S1()
-  // CHECK-NEXT: call void @_ZN7gh611452S1D1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[AGG_TMP_ENSURED]])
+  // CHECK-NEXT: call void @_ZN7gh611452S1D1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[AGG_TMP_ENSURED]])
   // a.k.a.Vec::~Vec()
-  // CHECK-NEXT: call void @_ZN7gh611453VecD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[V]])
+  // CHECK-NEXT: call void @_ZN7gh611453VecD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[V]])
   // CHECK-NEXT: ret void
   template <int I>
   void make1() {
@@ -416,9 +416,9 @@ namespace gh61145 {
   // CHECK-NEXT: [[C:%.*c.*]] = getelementptr inbounds nuw [[STRUCT_S2]], ptr [[AGG_TMP_ENSURED]], i32 0, i32
   // CHECK-NEXT: store i8 0, ptr [[C]], align 1
   // a.k.a. S2::~S2()
-  // CHECK-NEXT: call void @_ZN7gh611452S2D1Ev(ptr noundef nonnull align 1 dereferenceable(2) [[AGG_TMP_ENSURED]])
+  // CHECK-NEXT: call void @_ZN7gh611452S2D1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(2) [[AGG_TMP_ENSURED]])
   // a.k.a. Vec::~Vec()
-  // CHECK-NEXT: call void @_ZN7gh611453VecD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[V]])
+  // CHECK-NEXT: call void @_ZN7gh611453VecD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[V]])
   // CHECK-NEXT: ret void
   template <int I>
   void make2() {
diff --git a/clang/test/CodeGen/temporary-lifetime.cpp b/clang/test/CodeGen/temporary-lifetime.cpp
index 04087292b2c70..44d1235f15c86 100644
--- a/clang/test/CodeGen/temporary-lifetime.cpp
+++ b/clang/test/CodeGen/temporary-lifetime.cpp
@@ -24,12 +24,12 @@ void Test1() {
   // CHECK-DTOR: call void @llvm.lifetime.start.p0(ptr nonnull %[[ADDR:.+]])
   // CHECK-DTOR: call void @_ZN1AC1Ev(ptr nonnull {{[^,]*}} %[[VAR:[^ ]+]])
   // CHECK-DTOR: call void @_Z3FooIRK1AEvOT_
-  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr nonnull {{[^,]*}} %[[VAR]])
+  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr dead_on_return nonnull {{[^,]*}} %[[VAR]])
   // CHECK-DTOR: call void @llvm.lifetime.end.p0(ptr nonnull %[[ADDR]])
   // CHECK-DTOR: call void @llvm.lifetime.start.p0(ptr nonnull %[[ADDR:.+]])
   // CHECK-DTOR: call void @_ZN1AC1Ev(ptr nonnull {{[^,]*}} %[[VAR:[^ ]+]])
   // CHECK-DTOR: call void @_Z3FooIRK1AEvOT_
-  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr nonnull {{[^,]*}} %[[VAR]])
+  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr dead_on_return nonnull {{[^,]*}} %[[VAR]])
   // CHECK-DTOR: call void @llvm.lifetime.end.p0(ptr nonnull %[[ADDR]])
   // CHECK-DTOR: }
 
@@ -61,9 +61,9 @@ void Test2() {
   // CHECK-DTOR: call void @llvm.lifetime.start.p0(ptr nonnull %[[ADDR2:.+]])
   // CHECK-DTOR: call void @_ZN1AC1Ev(ptr nonnull {{[^,]*}} %[[VAR2:[^ ]+]])
   // CHECK-DTOR: call void @_Z3FooIRK1AEvOT_
-  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr nonnull {{[^,]*}} %[[VAR2]])
+  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr dead_on_return nonnull {{[^,]*}} %[[VAR2]])
   // CHECK-DTOR: call void @llvm.lifetime.end.p0(ptr nonnull %[[ADDR2]])
-  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr nonnull {{[^,]*}} %[[VAR1]])
+  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr dead_on_return nonnull {{[^,]*}} %[[VAR1]])
   // CHECK-DTOR: call void @llvm.lifetime.end.p0(ptr nonnull %[[ADDR1]])
   // CHECK-DTOR: }
 
@@ -155,12 +155,12 @@ void Test7() {
   // CHECK-DTOR: call void @llvm.lifetime.start.p0(ptr nonnull %[[ADDR:.+]])
   // CHECK-DTOR: call void @_Z3BazI1AET_v({{.*}} %[[SLOT:[^ ]+]])
   // CHECK-DTOR: call void @_Z3FooI1AEvOT_({{.*}} %[[SLOT]])
-  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr nonnull {{[^,]*}} %[[SLOT]])
+  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr dead_on_return nonnull {{[^,]*}} %[[SLOT]])
   // CHECK-DTOR: call void @llvm.lifetime.end.p0(ptr nonnull %[[ADDR]])
   // CHECK-DTOR: call void @llvm.lifetime.start.p0(ptr nonnull %[[ADDR:.+]])
   // CHECK-DTOR: call void @_Z3BazI1AET_v({{.*}} %[[SLOT:[^ ]+]])
   // CHECK-DTOR: call void @_Z3FooI1AEvOT_({{.*}} %[[SLOT]])
-  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr nonnull {{[^,]*}} %[[SLOT]])
+  // CHECK-DTOR: call void @_ZN1AD1Ev(ptr dead_on_return nonnull {{[^,]*}} %[[SLOT]])
   // CHECK-DTOR: call void @llvm.lifetime.end.p0(ptr nonnull %[[ADDR]])
   // CHECK-DTOR: }
   Foo(Baz<A>());
diff --git a/clang/test/CodeGenCXX/amdgcn-automatic-variable.cpp b/clang/test/CodeGenCXX/amdgcn-automatic-variable.cpp
index 3c2a624bd4f95..e05f8133321c7 100644
--- a/clang/test/CodeGenCXX/amdgcn-automatic-variable.cpp
+++ b/clang/test/CodeGenCXX/amdgcn-automatic-variable.cpp
@@ -75,7 +75,7 @@ int x;
 // CHECK-NEXT:    [[A:%.*]] = alloca [[CLASS_A:%.*]], align 4, addrspace(5)
 // CHECK-NEXT:    [[A_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[A]] to ptr
 // CHECK-NEXT:    call void @_ZN1AC1Ev(ptr noundef nonnull align 4 dereferenceable(4) [[A_ASCAST]])
-// CHECK-NEXT:    call void @_ZN1AD1Ev(ptr noundef nonnull align 4 dereferenceable(4) [[A_ASCAST]])
+// CHECK-NEXT:    call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 4 dereferenceable(4) [[A_ASCAST]])
 // CHECK-NEXT:    ret void
 //
 void func3() {
diff --git a/clang/test/CodeGenCXX/amdgcn-func-arg.cpp b/clang/test/CodeGenCXX/amdgcn-func-arg.cpp
index a5f83dc91b038..bc20c33ec4d0f 100644
--- a/clang/test/CodeGenCXX/amdgcn-func-arg.cpp
+++ b/clang/test/CodeGenCXX/amdgcn-func-arg.cpp
@@ -43,9 +43,9 @@ void func_with_indirect_arg(A a) {
 // CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[AGG_TMP_ASCAST]], ptr align 4 [[A_ASCAST]], i64 4, i1 false)
 // CHECK-NEXT:    [[AGG_TMP_ASCAST_ASCAST:%.*]] = addrspacecast ptr [[AGG_TMP_ASCAST]] to ptr addrspace(5)
 // CHECK-NEXT:    call void @_Z22func_with_indirect_arg1A(ptr addrspace(5) noundef [[AGG_TMP_ASCAST_ASCAST]])
-// CHECK-NEXT:    call void @_ZN1AD1Ev(ptr noundef nonnull align 4 dereferenceable(4) [[AGG_TMP_ASCAST]])
+// CHECK-NEXT:    call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 4 dereferenceable(4) [[AGG_TMP_ASCAST]])
 // CHECK-NEXT:    call void @_Z17func_with_ref_argR1A(ptr noundef nonnull align 4 dereferenceable(4) [[A_ASCAST]])
-// CHECK-NEXT:    call void @_ZN1AD1Ev(ptr noundef nonnull align 4 dereferenceable(4) [[A_ASCAST]])
+// CHECK-NEXT:    call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 4 dereferenceable(4) [[A_ASCAST]])
 // CHECK-NEXT:    ret void
 //
 void test_indirect_arg_auto() {
@@ -61,7 +61,7 @@ void test_indirect_arg_auto() {
 // CHECK-NEXT:    call void @llvm.memcpy.p0.p0.i64(ptr align 4 [[AGG_TMP_ASCAST]], ptr align 4 addrspacecast (ptr addrspace(1) @g_a to ptr), i64 4, i1 false)
 // CHECK-NEXT:    [[AGG_TMP_ASCAST_ASCAST:%.*]] = addrspacecast ptr [[AGG_TMP_ASCAST]] to ptr addrspace(5)
 // CHECK-NEXT:    call void @_Z22func_with_indirect_arg1A(ptr addrspace(5) noundef [[AGG_TMP_ASCAST_ASCAST]])
-// CHECK-NEXT:    call void @_ZN1AD1Ev(ptr noundef nonnull align 4 dereferenceable(4) [[AGG_TMP_ASCAST]])
+// CHECK-NEXT:    call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 4 dereferenceable(4) [[AGG_TMP_ASCAST]])
 // CHECK-NEXT:    call void @_Z17func_with_ref_argR1A(ptr noundef nonnull align 4 dereferenceable(4) addrspacecast (ptr addrspace(1) @g_a to ptr))
 // CHECK-NEXT:    ret void
 //
diff --git a/clang/test/CodeGenCXX/control-flow-in-stmt-expr.cpp b/clang/test/CodeGenCXX/control-flow-in-stmt-expr.cpp
index 4eafa720e0cb4..a764ba31539eb 100644
--- a/clang/test/CodeGenCXX/control-flow-in-stmt-expr.cpp
+++ b/clang/test/CodeGenCXX/control-flow-in-stmt-expr.cpp
@@ -217,7 +217,7 @@ void ArrayInit() {
   // CHECK:       [[ARRAY_DESTROY_BODY2]]:
   // CHECK-NEXT:    %arraydestroy.elementPast = phi ptr [ %1, %cleanup ], [ %arraydestroy.element, %[[ARRAY_DESTROY_BODY2]] ]
   // CHECK-NEXT:    %arraydestroy.element = getelementptr inbounds %struct.Printy, ptr %arraydestroy.elementPast, i64 -1
-  // CHECK-NEXT:    call void @_ZN6PrintyD1Ev(ptr noundef nonnull align 8 dereferenceable(8) %arraydestroy.element)
+  // CHECK-NEXT:    call void @_ZN6PrintyD1Ev(ptr dead_on_return noundef nonnull align 8 dereferenceable(8) %arraydestroy.element)
   // CHECK-NEXT:    %arraydestroy.done = icmp eq ptr %arraydestroy.element, %arr
   // CHECK-NEXT:    br i1 %arraydestroy.done, label %[[ARRAY_DESTROY_DONE2]], label %[[ARRAY_DESTROY_BODY2]]
 
@@ -265,7 +265,7 @@ void ArraySubobjects() {
     // CHECK:       [[ARRAY_DESTROY_BODY]]:
     // CHECK-NEXT:    %arraydestroy.elementPast = phi ptr [ %0, %if.then ], [ %arraydestroy.element, %[[ARRAY_DESTROY_BODY]] ]
     // CHECK-NEXT:    %arraydestroy.element = getelementptr inbounds %struct.Printy, ptr %arraydestroy.elementPast, i64 -1
-    // CHECK-NEXT:    call void @_ZN6PrintyD1Ev(ptr noundef nonnull align 8 dereferenceable(8) %arraydestroy.element)
+    // CHECK-NEXT:    call void @_ZN6PrintyD1Ev(ptr dead_on_return noundef nonnull align 8 dereferenceable(8) %arraydestroy.element)
     // CHECK-NEXT:    %arraydestroy.done = icmp eq ptr %arraydestroy.element, %arr2
     // CHECK-NEXT:    br i1 %arraydestroy.done, label %[[ARRAY_DESTROY_DONE]], label %[[ARRAY_DESTROY_BODY]]
 
@@ -277,7 +277,7 @@ void ArraySubobjects() {
     // CHECK:       [[ARRAY_DESTROY_BODY2]]:
     // CHECK-NEXT:    %arraydestroy.elementPast4 = phi ptr [ %1, %[[ARRAY_DESTROY_DONE]] ], [ %arraydestroy.element5, %[[ARRAY_DESTROY_BODY2]] ]
     // CHECK-NEXT:    %arraydestroy.element5 = getelementptr inbounds %struct.Printy, ptr %arraydestroy.elementPast4, i64 -1
-    // CHECK-NEXT:    call void @_ZN6PrintyD1Ev(ptr noundef nonnull align 8 dereferenceable(8) %arraydestroy.element5)
+    // CHECK-NEXT:    call void @_ZN6PrintyD1Ev(ptr dead_on_return noundef nonnull align 8 dereferenceable(8) %arraydestroy.element5)
     // CHECK-NEXT:    %arraydestroy.done6 = icmp eq ptr %arraydestroy.element5, [[ARRAY_BEGIN]]
     // CHECK-NEXT:    br i1 %arraydestroy.done6, label %[[ARRAY_DESTROY_DONE2:.+]], label %[[ARRAY_DESTROY_BODY2]]
 
@@ -384,7 +384,7 @@ void NewArrayInit() {
   // CHECK:       arraydestroy.body:
   // CHECK-NEXT:    %arraydestroy.elementPast = phi ptr [ %{{.*}}, %if.then ], [ %arraydestroy.element, %arraydestroy.body ]
   // CHECK-NEXT:    %arraydestroy.element = getelementptr inbounds %struct.Printy, ptr %arraydestroy.elementPast, i64 -1
-  // CHECK-NEXT:    call void @_ZN6PrintyD1Ev(ptr noundef nonnull align 8 dereferenceable(8) %arraydestroy.element)
+  // CHECK-NEXT:    call void @_ZN6PrintyD1Ev(ptr dead_on_return noundef nonnull align 8 dereferenceable(8) %arraydestroy.element)
   // CHECK-NEXT:    %arraydestroy.done = icmp eq ptr %arraydestroy.element, %0
   // CHECK-NEXT:    br i1 %arraydestroy.done, label %arraydestroy.done{{.*}}, label %arraydestroy.body
 
diff --git a/clang/test/CodeGenCXX/cxx2a-destroying-delete.cpp b/clang/test/CodeGenCXX/cxx2a-destroying-delete.cpp
index 24b1a4dd42977..af29120feb5bb 100644
--- a/clang/test/CodeGenCXX/cxx2a-destroying-delete.cpp
+++ b/clang/test/CodeGenCXX/cxx2a-destroying-delete.cpp
@@ -41,11 +41,11 @@ void glob_delete_A(A *a) { ::delete a; }
 // CHECK: icmp eq ptr %[[a]], null
 // CHECK: br i1
 
-// CHECK-ITANIUM: call void @_ZN1AD1Ev(ptr noundef nonnull align 8 dereferenceable(8) %[[a]])
+// CHECK-ITANIUM: call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 8 dereferenceable(8) %[[a]])
 // CHECK-ITANIUM-NEXT: call void @_ZdlPvm(ptr noundef %[[a]], i64 noundef 8)
-// CHECK-MSABI64: call void @"??1A@@QEAA@XZ"(ptr noundef nonnull align 8 dereferenceable(8) %[[a]])
+// CHECK-MSABI64: call void @"??1A@@QEAA@XZ"(ptr dead_on_return noundef nonnull align 8 dereferenceable(8) %[[a]])
 // CHECK-MSABI64-NEXT: call void @"??3@YAXPEAX_K@Z"(ptr noundef %[[a]], i64 noundef 8)
-// CHECK-MSABI32: call x86_thiscallcc void @"??1A@@QAE@XZ"(ptr noundef nonnull align 4 dereferenceable(4) %[[a]])
+// CHECK-MSABI32: call x86_thiscallcc void @"??1A@@QAE@XZ"(ptr dead_on_return noundef nonnull align 4 dereferenceable(4) %[[a]])
 // CHECK-MSABI32-NEXT: call void @"??3@YAXPAXI@Z"(ptr noundef %[[a]], i32 noundef 4)
 
 struct B {
diff --git a/clang/test/CodeGenCXX/for-range.cpp b/clang/test/CodeGenCXX/for-range.cpp
index 088a34647c374..b9706855f658c 100644
--- a/clang/test/CodeGenCXX/for-range.cpp
+++ b/clang/test/CodeGenCXX/for-range.cpp
@@ -53,7 +53,7 @@ extern B array[5];
 // CHECK:       for.body:
 // CHECK-NEXT:    [[TMP2:%.*]] = load ptr, ptr [[__BEGIN1]], align 8
 // CHECK-NEXT:    call void @_ZN1BC1ERKS_(ptr noundef nonnull align 1 dereferenceable(1) [[B]], ptr noundef nonnull align 1 dereferenceable(1) [[TMP2]])
-// CHECK-NEXT:    call void @_ZN1BD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[B]]) #[[ATTR3:[0-9]+]]
+// CHECK-NEXT:    call void @_ZN1BD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[B]]) #[[ATTR3:[0-9]+]]
 // CHECK-NEXT:    br label [[FOR_INC:%.*]]
 // CHECK:       for.inc:
 // CHECK-NEXT:    [[TMP3:%.*]] = load ptr, ptr [[__BEGIN1]], align 8
@@ -61,7 +61,7 @@ extern B array[5];
 // CHECK-NEXT:    store ptr [[INCDEC_PTR]], ptr [[__BEGIN1]], align 8
 // CHECK-NEXT:    br label [[FOR_COND]]
 // CHECK:       for.end:
-// CHECK-NEXT:    call void @_ZN1AD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR3]]
+// CHECK-NEXT:    call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
 //
 void for_array() {
@@ -81,10 +81,10 @@ void for_array() {
 // CHECK-NEXT:    call void @_ZN1AC1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[A]])
 // CHECK-NEXT:    call void @_ZN1CC1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[REF_TMP]])
 // CHECK-NEXT:    store ptr [[REF_TMP]], ptr [[__RANGE1]], align 8
-// CHECK-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[__RANGE1]], align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[__RANGE1]], align 8, !nonnull [[META2:![0-9]+]]
 // CHECK-NEXT:    [[CALL:%.*]] = call noundef ptr @_Z5beginR1C(ptr noundef nonnull align 1 dereferenceable(1) [[TMP0]])
 // CHECK-NEXT:    store ptr [[CALL]], ptr [[__BEGIN1]], align 8
-// CHECK-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[__RANGE1]], align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[__RANGE1]], align 8, !nonnull [[META2]]
 // CHECK-NEXT:    [[CALL1:%.*]] = call noundef ptr @_Z3endR1C(ptr noundef nonnull align 1 dereferenceable(1) [[TMP1]])
 // CHECK-NEXT:    store ptr [[CALL1]], ptr [[__END1]], align 8
 // CHECK-NEXT:    br label [[FOR_COND:%.*]]
@@ -94,12 +94,12 @@ void for_array() {
 // CHECK-NEXT:    [[CMP:%.*]] = icmp ne ptr [[TMP2]], [[TMP3]]
 // CHECK-NEXT:    br i1 [[CMP]], label [[FOR_BODY:%.*]], label [[FOR_COND_CLEANUP:%.*]]
 // CHECK:       for.cond.cleanup:
-// CHECK-NEXT:    call void @_ZN1CD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[REF_TMP]]) #[[ATTR3]]
+// CHECK-NEXT:    call void @_ZN1CD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[REF_TMP]]) #[[ATTR3]]
 // CHECK-NEXT:    br label [[FOR_END:%.*]]
 // CHECK:       for.body:
 // CHECK-NEXT:    [[TMP4:%.*]] = load ptr, ptr [[__BEGIN1]], align 8
 // CHECK-NEXT:    call void @_ZN1BC1ERKS_(ptr noundef nonnull align 1 dereferenceable(1) [[B]], ptr noundef nonnull align 1 dereferenceable(1) [[TMP4]])
-// CHECK-NEXT:    call void @_ZN1BD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[B]]) #[[ATTR3]]
+// CHECK-NEXT:    call void @_ZN1BD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[B]]) #[[ATTR3]]
 // CHECK-NEXT:    br label [[FOR_INC:%.*]]
 // CHECK:       for.inc:
 // CHECK-NEXT:    [[TMP5:%.*]] = load ptr, ptr [[__BEGIN1]], align 8
@@ -107,7 +107,7 @@ void for_array() {
 // CHECK-NEXT:    store ptr [[INCDEC_PTR]], ptr [[__BEGIN1]], align 8
 // CHECK-NEXT:    br label [[FOR_COND]]
 // CHECK:       for.end:
-// CHECK-NEXT:    call void @_ZN1AD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR3]]
+// CHECK-NEXT:    call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
 //
 void for_range() {
@@ -127,10 +127,10 @@ void for_range() {
 // CHECK-NEXT:    call void @_ZN1AC1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[A]])
 // CHECK-NEXT:    call void @_ZN1DC1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[REF_TMP]])
 // CHECK-NEXT:    store ptr [[REF_TMP]], ptr [[__RANGE1]], align 8
-// CHECK-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[__RANGE1]], align 8
+// CHECK-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[__RANGE1]], align 8, !nonnull [[META2]]
 // CHECK-NEXT:    [[CALL:%.*]] = call noundef ptr @_ZN1D5beginEv(ptr noundef nonnull align 1 dereferenceable(1) [[TMP0]])
 // CHECK-NEXT:    store ptr [[CALL]], ptr [[__BEGIN1]], align 8
-// CHECK-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[__RANGE1]], align 8
+// CHECK-NEXT:    [[TMP1:%.*]] = load ptr, ptr [[__RANGE1]], align 8, !nonnull [[META2]]
 // CHECK-NEXT:    [[CALL1:%.*]] = call noundef ptr @_ZN1D3endEv(ptr noundef nonnull align 1 dereferenceable(1) [[TMP1]])
 // CHECK-NEXT:    store ptr [[CALL1]], ptr [[__END1]], align 8
 // CHECK-NEXT:    br label [[FOR_COND:%.*]]
@@ -140,12 +140,12 @@ void for_range() {
 // CHECK-NEXT:    [[CMP:%.*]] = icmp ne ptr [[TMP2]], [[TMP3]]
 // CHECK-NEXT:    br i1 [[CMP]], label [[FOR_BODY:%.*]], label [[FOR_COND_CLEANUP:%.*]]
 // CHECK:       for.cond.cleanup:
-// CHECK-NEXT:    call void @_ZN1DD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[REF_TMP]]) #[[ATTR3]]
+// CHECK-NEXT:    call void @_ZN1DD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[REF_TMP]]) #[[ATTR3]]
 // CHECK-NEXT:    br label [[FOR_END:%.*]]
 // CHECK:       for.body:
 // CHECK-NEXT:    [[TMP4:%.*]] = load ptr, ptr [[__BEGIN1]], align 8
 // CHECK-NEXT:    call void @_ZN1BC1ERKS_(ptr noundef nonnull align 1 dereferenceable(1) [[B]], ptr noundef nonnull align 1 dereferenceable(1) [[TMP4]])
-// CHECK-NEXT:    call void @_ZN1BD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[B]]) #[[ATTR3]]
+// CHECK-NEXT:    call void @_ZN1BD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[B]]) #[[ATTR3]]
 // CHECK-NEXT:    br label [[FOR_INC:%.*]]
 // CHECK:       for.inc:
 // CHECK-NEXT:    [[TMP5:%.*]] = load ptr, ptr [[__BEGIN1]], align 8
@@ -153,7 +153,7 @@ void for_range() {
 // CHECK-NEXT:    store ptr [[INCDEC_PTR]], ptr [[__BEGIN1]], align 8
 // CHECK-NEXT:    br label [[FOR_COND]]
 // CHECK:       for.end:
-// CHECK-NEXT:    call void @_ZN1AD1Ev(ptr noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR3]]
+// CHECK-NEXT:    call void @_ZN1AD1Ev(ptr dead_on_return noundef nonnull align 1 dereferenceable(1) [[A]]) #[[ATTR3]]
 // CHECK-NEXT:    ret void
 //
 void for_member_range() {
diff --git a/clang/test/CodeGenCXX/gh62818.cpp b/clang/test/CodeGenCXX/gh62818.cpp
index ec91b40fca077..f903679cd6b68 100644
--- a/clang/test/CodeGenCXX/gh62818.cpp
+++ b/clang/test/CodeGenCXX/gh62818.c...
[truncated]
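As a reading aid for the CGCall.cpp hunk in the patch above, here is a hedged model of its decision logic as it appears in this diff excerpt (later commits in the PR may have changed it). `CalleeFacts` and `addsDeadOnReturn` are hypothetical; the real code queries `CXXDestructorDecl` and `CXXRecordDecl` directly. The outer guard (instance method, no inalloca argument, non-void `this` type, not a thunk) is folded into a single flag.

```cpp
#include <cassert>

// Hypothetical, simplified view of the facts the hunk consults.
struct CalleeFacts {
  bool outer_guard_ok;  // instance method, no inalloca arg, non-void `this`, not a thunk
  bool is_destructor;   // callee decl is a CXXDestructorDecl
  unsigned num_bases;   // CXXRecordDecl::getNumBases() of the destructor's class
  unsigned num_vbases;  // CXXRecordDecl::getNumVBases() of the destructor's class
};

// True when the hunk, as shown in the diff, would add dead_on_return to `this`.
bool addsDeadOnReturn(const CalleeFacts& f) {
  if (!f.outer_guard_ok) return false;  // nonnull/align/etc. are skipped as well
  if (!f.is_destructor) return false;   // only destructors qualify
  return f.num_bases == 0 && f.num_vbases == 0;  // only classes without any bases
}
```

Under this model, the destructor of `struct A { ~A(); };` qualifies, while the destructor of `struct B : A { ~B(); };` does not.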

@boomanaiden154
Contributor Author

The actual change is in the first commit (7a3dec4). I've separated the test changes out into 6ec0952 to hopefully make review a bit easier.

CC @philnik777, who came up with the idea and requested this.

@rnk
Collaborator

rnk commented Dec 1, 2025

This optimization exploits the fact that it's undefined behavior to read from an object after it's been destroyed. Given the overall shift in how the industry feels about compilers exploiting undefined behavior, I want to push to add a flag to control this. Think of the people who use -fno-delete-null-pointer-checks: the kinds of people who use that are going to want to disable this kind of optimization. This optimization should absolutely be on by default; we'd just have a way to opt out, mentioned in the release notes, etc.
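To make the trade-off concrete, here is a minimal sketch of the pattern under discussion, assuming a hypothetical `SimpleString` class (not libc++'s `basic_string`) whose destructor ends with "defensive" zeroing stores into `*this`. Because no well-defined program can read those fields once the destructor returns, passing `this` as `dead_on_return` permits dead store elimination to drop them; code that does peek at the object afterwards is exactly what an opt-out flag would protect.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical string-like class for illustration only; it is NOT libc++'s
// basic_string, it just has the same shape of destructor.
class SimpleString {
 public:
  explicit SimpleString(const char* s)
      : len_(std::strlen(s)),
        data_(static_cast<char*>(std::malloc(len_ + 1))) {
    if (data_ == nullptr) std::abort();
    std::memcpy(data_, s, len_ + 1);  // copy including the trailing '\0'
  }
  SimpleString(const SimpleString&) = delete;
  SimpleString& operator=(const SimpleString&) = delete;

  ~SimpleString() {
    std::free(data_);
    // Zeroing stores into *this. Reading these fields after the destructor
    // returns is undefined behavior, so once `this` is treated as
    // dead_on_return the optimizer may consider both stores dead.
    data_ = nullptr;
    len_ = 0;
  }

  std::size_t size() const { return len_; }
  const char* c_str() const { return data_; }

 private:
  std::size_t len_;  // declared before data_, so it is initialized first
  char* data_;
};
```

A conforming program observes no difference; only code that reads `len_` or `data_` after destruction (undefined behavior) could notice the stores disappearing.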

I'd also like to better understand why base classes matter for this annotation. Until very recently, basic_string used a bunch of compressed pair empty bases instead of [[no_unique_address]], so adding a base class might create a surprising performance regression with the change as written.

Collaborator

@rnk rnk left a comment

Overall, we should do it, this is a valuable optimization. (Commenting twice to push out the inline comments).

.getAsAlign();
Attrs.addAlignmentAttr(Alignment);

if (isa_and_nonnull<CXXDestructorDecl>(
Collaborator

Code golf: You can CSE the CalleeInfo.getCalleeDecl().getDecl() if you use dyn_cast_or_null.

Contributor Author

Done. Ended up using dyn_cast_if_present.

CalleeInfo.getCalleeDecl().getDecl())) {
auto *ClassDecl = dyn_cast<CXXRecordDecl>(
CalleeInfo.getCalleeDecl().getDecl()->getDeclContext());
if (ClassDecl->getNumBases() == 0 && ClassDecl->getNumVBases() == 0) {
Collaborator

Why do we have to limit this to only destructors of classes with no bases? Whatever the reason (caution, incremental change, etc), comments here would be appreciated.

Contributor

@rjmccall rjmccall Dec 1, 2025

Virtual base subobjects aren't dead after a base subobject destructor call, but yeah, I can't think of a reason to limit this because of non-virtual bases alone. And even virtual bases are dead after a complete-object destructor call.

Contributor Author

For now, just to be conservative while we get implementation experience. I've added a TODO for myself to remove this after we do more testing.

Collaborator

@efriedma-quic efriedma-quic left a comment

Is the potential undefined behavior here checked by msan? Do we need to disable adding this attribute if msan is enabled?

@boomanaiden154
Contributor Author

Is the potential undefined behavior here checked by msan? Do we need to disable adding this attribute if msan is enabled?

Yes, use-after-destroy is checked by msan, but I don't believe we need to explicitly disable anything here to handle msan. Based on my understanding, the msan instrumentation will add a call to __sanitizer_dtor_callback_fields at the end of the destructor, passing in the this pointer along with the object size. __sanitizer_dtor_callback_fields then modifies the shadow memory, which is a separate object that is not marked dead by this change.

CC @thurstond, who is more familiar with the internals of msan and would be able to confirm.

@boomanaiden154
Contributor Author

boomanaiden154 commented Dec 2, 2025

I'd also like to better understand why base classes matter for this annotation. Until very recently, basic_string used a bunch of compressed pair empty bases instead of [[no_unique_address]], so adding a base class might create a surprising performance regression with the change as written.

I was mostly concerned about cases like https://godbolt.org/z/fq4fWfqKv, thinking that we might eliminate stores that would otherwise propagate into the base destructors. I didn't realize that the more-base destructor is just invoked as a call at the end of the derived class destructor, so this is a non-issue.
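The ordering point can be checked with a small program. This is a sketch with hypothetical `Base`, `Member`, and `Derived` types that log their destructors: the body of `~Derived()` runs first, then its members are destroyed, and `~Base()` runs last as part of destroying the `Derived` object. By the time the outer destructor call returns, the non-virtual base subobject is already dead as well.

```cpp
#include <cassert>
#include <string>

// Global log of destructor calls, in the order they run.
std::string g_log;

struct Member {
  ~Member() { g_log += "Member "; }
};

struct Base {
  int base_field = 1;
  ~Base() { g_log += "Base "; }
};

struct Derived : Base {
  Member m;
  int derived_field = 2;
  ~Derived() {
    derived_field = 0;  // a store to *this inside the derived body
    g_log += "Derived ";
    // After this body finishes, the implicit epilogue destroys `m` and then
    // destroys the Base subobject by calling ~Base().
  }
};

// Destroys one complete Derived object and returns the recorded order.
std::string destroy_one_derived() {
  g_log.clear();
  { Derived d; }
  return g_log;
}
```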

Besides that, I wouldn't mind being conservative initially and making sure that everything ships internally as smoothly as my initial testing did before making this optimization broader. I don't expect large fallout, but there doesn't seem to be a downside to doing things incrementally.

@rjmccall
Contributor

rjmccall commented Dec 2, 2025

You're correct to be conservative about virtual bases, because they will generally still be live out of the base subobject destructor. I think it's worth doing this for non-virtual bases just to avoid needing an excessive number of controlling flags, since otherwise I assume people might want a staging flag that disables this just for classes with bases.
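A diamond with a virtual base shows why virtual bases need the extra care. In this sketch (hypothetical types `V`, `A`, `B`, `D`), the shared `V` subobject is destroyed only after `~B()` and `~A()` have both run as base-subobject destructors (D2 in the Itanium C++ ABI), so `V` is still live when those calls return. Only the complete-object destructor of the most-derived class destroys it, which is why a standalone `A` does destroy `V`.

```cpp
#include <cassert>
#include <string>

// Global log of destructor calls, in the order they run.
std::string g_order;

struct V {  // virtual base shared by A and B
  ~V() { g_order += "V "; }
};
struct A : virtual V {
  ~A() { g_order += "A "; }
};
struct B : virtual V {
  ~B() { g_order += "B "; }
};
struct D : A, B {
  ~D() { g_order += "D "; }
};

// Destroys a complete D: the ~D body, then the non-virtual bases in reverse
// declaration order (B, then A), and only then the virtual base V.
std::string destroy_diamond() {
  g_order.clear();
  { D d; }
  return g_order;
}

// Destroys a complete A: here A's complete-object destructor destroys V.
std::string destroy_standalone_a() {
  g_order.clear();
  { A a; }
  return g_order;
}
```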

@llvmbot llvmbot added clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Dec 2, 2025
@nikic
Contributor

nikic commented Dec 2, 2025

GCC has a related -fno-lifetime-dse option, possibly we should be integrating with that flag instead of creating a new one? (Notably, LLVM sets that flag because we have some hacks in the User implementation that rely on accessing the object in overloaded operator delete, see #24952).

@thurstond
Contributor

Is the potential undefined behavior here checked by msan? Do we need to disable adding this attribute if msan is enabled?

Yes, use-after-destroy is checked by msan, but I don't believe we need to explicitly disable anything here to handle msan. Based on my understanding, the msan instrumentation will add a call to __sanitizer_dtor_callback_fields at the end of the destructor, passing in the this pointer along with the object size.

This is my understanding as well.

__sanitizer_dtor_callback_fields then modifies the shadow memory, which is a separate object that is not marked dead by this change.

Yep, shadow memory is opaque to the clang backend - the shadow is just a big blob of memory (~16TB on x86-64), not a bunch of objects.

@boomanaiden154
Contributor Author

You're correct to be conservative about virtual bases, because they will generally still be live out of the base subobject destructor. I think it's worth doing this for non-virtual bases just to avoid needing an excessive number of controlling flags, since otherwise I assume people might want a staging flag that disables this just for classes with bases.

Thanks for confirming. I've constructed https://godbolt.org/z/nKPfhPxsx, which I believe demonstrates the problem there (in the complete-object destructor we end up destroying foobar before foo or bar, so treating stores to this inside of foobar as dead would change the behavior).

Given that you said "generally", it sounds like there are cases where virtual bases are not live out of the base subobject destructor?

@boomanaiden154
Contributor Author

boomanaiden154 commented Dec 2, 2025

GCC has a related -fno-lifetime-dse option, possibly we should be integrating with that flag instead of creating a new one? (Notably, LLVM sets that flag because we have some hacks in the User implementation that rely on accessing the object in overloaded operator delete, see #24952).

Reusing the existing flag name sounds reasonable enough to me, but in #40040 it seems like people had opinions on the flag naming, with @zygoloid suggesting -fno-strict-lifetimes for the opt-out. It looks like a dummy implementation of the flag was proposed in https://reviews.llvm.org/D150930, but it never landed. I know that in the past we have aimed for GCC compatibility, so I'm wondering what people's opinions are here. I think -fstrict-lifetimes makes a lot more sense to users, but it might be good to support the GCC naming too. It should be pretty simple to support both and just alias them.

Regarding #24952, I guess I did not end up hitting that one because of the intentional conservativeness around base classes (I also didn't do a bootstrapping build with LTO).

@boomanaiden154
Contributor Author

I think it's worth doing this for non-virtual bases just to avoid needing an excessive number of controlling flags, since otherwise I assume people might want a staging flag that disables this just for classes with bases.

I've left this as a TODO, mainly so I can make sure that our internal release process doesn't catch many more issues than my testing did. Then I plan to do another round of testing without the check on non-virtual bases and remove the TODO. We don't need more selective flags than this, and I don't see why other users couldn't just be pointed at an opt-out flag. Is this strategy okay, or would you like to see the non-virtual base check removed before landing?

@github-actions

github-actions bot commented Jan 22, 2026

🪟 Windows x64 Test Results

  • 54040 tests passed
  • 2273 tests skipped

✅ The build succeeded and all tests passed.

@github-actions

github-actions bot commented Jan 22, 2026

🐧 Linux x64 Test Results

  • 113285 tests passed
  • 4659 tests skipped

✅ The build succeeded and all tests passed.

@boomanaiden154
Contributor Author

Bump on this when reviewers get a chance. Thanks!

Collaborator

@zygoloid zygoloid left a comment

LGTM; I believe this is correct from a language rules and ABI perspective (but you should get approval from someone who's more involved with Clang these days).

@boomanaiden154
Contributor Author

@efriedma-quic @asl Can one of you review this as clang codegen maintainers?

(FYI: I know the CI is failing. There are ~200 tests that need updating, and I figured I would do the mechanical updates after people have approved, so as not to pollute the diff for review.)

Collaborator

@efriedma-quic efriedma-quic left a comment

I'll want to skim the "mechanical" update to be sure there aren't any unexpected effects. Otherwise LGTM

@boomanaiden154
Contributor Author

I'll want to skim the "mechanical" update to be sure there aren't any unexpected effects. Otherwise LGTM

Just pushed the test update. Some of the OpenMP tests that use update_cc_test_checks picked up unrelated diffs. I've left them in for now, but I can also pull that diff out into a separate PR or omit it altogether if preferred.

// CHECK-NEXT: %[[FPLOAD:.*]] = load ptr, ptr %[[FPGEP]]
// X64-NEXT: %[[CALL:.*]] = call noundef ptr %[[FPLOAD]](ptr noundef nonnull align 8 dereferenceable(8) %[[LPTR]], i32 noundef 3)
// X86-NEXT: %[[CALL:.*]] = call x86_thiscallcc noundef ptr %[[FPLOAD]](ptr noundef nonnull align 4 dereferenceable(4) %[[LPTR]], i32 noundef 3)
// X64-NEXT: %[[CALL:.*]] = call noundef ptr %[[FPLOAD]](ptr noundef nonnull align 8 dead_on_return(8) dereferenceable(8) %[[LPTR]], i32 noundef 3)
Collaborator

This is a little scary; the call here is actually deleting an array.

I think it might actually be safe to assume here that the array has at least one element, but I'm not sure that generalizes. I'd be more comfortable if the code explicitly checked getDtorType() and excluded Dtor_VectorDeleting.

CC @Fznamznon

Contributor Author

Good point. Omitted Dtor_VectorDeleting. In the general case we can't know the size of the array statically, so I don't think it makes sense to annotate the vector deleting destructor anyway.

And this would definitely be incorrect for zero-length arrays.
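The zero-length case is easy to demonstrate at the language level. The sketch below (hypothetical `Elem` type and `destroyed_by_delete_array` helper) counts the element destructors run by `delete[]`: for `new Elem[0]` none run, so a pointer handed to array-deleting code need not point at any live element. It does not model the Microsoft ABI vector deleting destructor itself, only the C++ semantics behind the zero-length concern.

```cpp
#include <cassert>
#include <cstddef>

// Counts how many Elem destructors have run.
int g_dtor_calls = 0;

struct Elem {
  int value = 0;
  ~Elem() { ++g_dtor_calls; }
};

// Allocates an array of n elements with new[], destroys it with delete[],
// and returns how many element destructors ran.
int destroyed_by_delete_array(std::size_t n) {
  g_dtor_calls = 0;
  Elem* arr = new Elem[n];  // n == 0 is valid and yields a non-null pointer
  delete[] arr;             // runs exactly n element destructors
  return g_dtor_calls;
}
```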

Collaborator

@efriedma-quic efriedma-quic left a comment

LGTM

@boomanaiden154 boomanaiden154 merged commit 278fd05 into llvm:main Feb 6, 2026
11 checks passed
@boomanaiden154 boomanaiden154 deleted the clang-destructor-dead-on-return branch February 6, 2026 22:29
@llvm-ci

llvm-ci commented Feb 6, 2026

LLVM Buildbot has detected a new failure on builder sanitizer-aarch64-linux running on sanitizer-buildbot8 while building clang at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/51/builds/31550

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
[201/205] Generating MSAN_INST_TEST_OBJECTS.msan_test.cpp.aarch64-with-call.o
[202/205] Generating Msan-aarch64-with-call-Test
[203/205] Generating MSAN_INST_TEST_OBJECTS.msan_test.cpp.aarch64.o
[204/205] Generating Msan-aarch64-Test
[204/205] Running compiler_rt regression tests
llvm-lit: /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/utils/lit/lit/discovery.py:273: warning: input '/home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/interception/Unit' contained no tests
llvm-lit: /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/utils/lit/lit/discovery.py:273: warning: input '/home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/Unit' contained no tests
llvm-lit: /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/utils/lit/lit/main.py:74: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 6347 tests, 72 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.
FAIL: XRay-aarch64-linux :: TestCases/Posix/basic-filtering.cpp (5586 of 6347)
******************** TEST 'XRay-aarch64-linux :: TestCases/Posix/basic-filtering.cpp' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 4
/home/b/sanitizer-aarch64-linux/build/build_default/./bin/clang  --driver-mode=g++ -fxray-instrument   -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta    -std=c++11 /home/b/sanitizer-aarch64-linux/build/llvm-project/compiler-rt/test/xray/TestCases/Posix/basic-filtering.cpp -o /home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/xray/AARCH64LinuxConfig/TestCases/Posix/Output/basic-filtering.cpp.tmp -g
# executed command: /home/b/sanitizer-aarch64-linux/build/build_default/./bin/clang --driver-mode=g++ -fxray-instrument -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta -std=c++11 /home/b/sanitizer-aarch64-linux/build/llvm-project/compiler-rt/test/xray/TestCases/Posix/basic-filtering.cpp -o /home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/xray/AARCH64LinuxConfig/TestCases/Posix/Output/basic-filtering.cpp.tmp -g
# note: command had no output on stdout or stderr
# RUN: at line 5
rm -f basic-filtering-*
# executed command: rm -f 'basic-filtering-*'
# note: command had no output on stdout or stderr
# RUN: at line 6
env XRAY_OPTIONS="patch_premain=true xray_mode=xray-basic verbosity=1      xray_logfile_base=basic-filtering-      xray_naive_log_func_duration_threshold_us=1000      xray_naive_log_max_stack_depth=2"  /home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/xray/AARCH64LinuxConfig/TestCases/Posix/Output/basic-filtering.cpp.tmp 2>&1 |      FileCheck /home/b/sanitizer-aarch64-linux/build/llvm-project/compiler-rt/test/xray/TestCases/Posix/basic-filtering.cpp
# executed command: env 'XRAY_OPTIONS=patch_premain=true xray_mode=xray-basic verbosity=1      xray_logfile_base=basic-filtering-      xray_naive_log_func_duration_threshold_us=1000      xray_naive_log_max_stack_depth=2' /home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/xray/AARCH64LinuxConfig/TestCases/Posix/Output/basic-filtering.cpp.tmp
# note: command had no output on stdout or stderr
# executed command: FileCheck /home/b/sanitizer-aarch64-linux/build/llvm-project/compiler-rt/test/xray/TestCases/Posix/basic-filtering.cpp
# note: command had no output on stdout or stderr
# RUN: at line 11
ls basic-filtering-* | head -1 | tr -d '\n' > /home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/xray/AARCH64LinuxConfig/TestCases/Posix/Output/basic-filtering.cpp.tmp.log
# executed command: ls 'basic-filtering-*'
# note: command had no output on stdout or stderr
# executed command: head -1
# note: command had no output on stdout or stderr
# executed command: tr -d '\n'
# note: command had no output on stdout or stderr
# RUN: at line 12
/home/b/sanitizer-aarch64-linux/build/build_default/./bin/llvm-xray convert --symbolize --output-format=yaml -instr_map=/home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/xray/AARCH64LinuxConfig/TestCases/Posix/Output/basic-filtering.cpp.tmp      "/home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/xray/AARCH64LinuxConfig/TestCases/Posix/basic-filtering-basic-filtering.cpp.tmp.ImfX3E" |      FileCheck /home/b/sanitizer-aarch64-linux/build/llvm-project/compiler-rt/test/xray/TestCases/Posix/basic-filtering.cpp --check-prefix TRACE
# executed command: /home/b/sanitizer-aarch64-linux/build/build_default/./bin/llvm-xray convert --symbolize --output-format=yaml -instr_map=/home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/xray/AARCH64LinuxConfig/TestCases/Posix/Output/basic-filtering.cpp.tmp '%{readfile:/home/b/sanitizer-aarch64-linux/build/build_default/runtimes/runtimes-bins/compiler-rt/test/xray/AARCH64LinuxConfig/TestCases/Posix/Output/basic-filtering.cpp.tmp.log}'
# note: command had no output on stdout or stderr
# executed command: FileCheck /home/b/sanitizer-aarch64-linux/build/llvm-project/compiler-rt/test/xray/TestCases/Posix/basic-filtering.cpp --check-prefix TRACE
# .---command stderr------------
# | /home/b/sanitizer-aarch64-linux/build/llvm-project/compiler-rt/test/xray/TestCases/Posix/basic-filtering.cpp:61:15: error: TRACE-NOT: excluded string found in input
# | // TRACE-NOT: - { type: 0, func-id: {{.*}}, function: {{.*filtered.*}}, {{.*}} }
# |               ^
# | <stdin>:10:2: note: found here
# |  - { type: 0, func-id: 1, function: 'filtered()', cpu: 0, thread: 2819755, process: 2819755, kind: function-enter, tsc: 1770418656993186142, data: '' }
Step 9 (test compiler-rt symbolizer) failure: test compiler-rt symbolizer (failure)

boomanaiden154 added a commit that referenced this pull request Feb 8, 2026
…6276)"

This reverts commit 047db15.

The original version of the commit caused assertion failures in DSE.
Those were fixed in ec059d8, so trying
to reland this again.
rishabhmadan19 pushed a commit to rishabhmadan19/llvm-project that referenced this pull request Feb 9, 2026
rishabhmadan19 pushed a commit to rishabhmadan19/llvm-project that referenced this pull request Feb 9, 2026
rishabhmadan19 pushed a commit to rishabhmadan19/llvm-project that referenced this pull request Feb 9, 2026
…m#166276)"

Xinlong-Chen pushed a commit to Xinlong-Chen/llvm-project that referenced this pull request Feb 12, 2026
Xinlong-Chen pushed a commit to Xinlong-Chen/llvm-project that referenced this pull request Feb 12, 2026
Xinlong-Chen pushed a commit to Xinlong-Chen/llvm-project that referenced this pull request Feb 12, 2026
…m#166276)"


Labels

clang:codegen IR generation bugs: mangling, exceptions, etc. clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:openmp OpenMP related changes to Clang clang Clang issues not falling into any other category
